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SYSTEM AND METHOD FOR INTEGRATING HETEROGENEOUS 

INFORMATION 



Field of the Invention 

The present invention relates generally to information processing, and more 
particularly, to a system and method for integrating heterogeneous information. 

Identification of Copvright 

A portion of the disclosure of this patent document contains material which is 
subject to copyright protection. The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent disclosure, as it appears 
in the Patent and Trademark Office patent file or records, but otherwise reserves all 
copyright rights whatsoever. 

Description of the Background Art 

The key technological challenge in integrating heterogeneous information is that 
different types of information is analyzed using fundamentally different evaluators. 
Evaluators perform functions such as searching, prediction, collaborative filtering, data 
mining, and the like, and include such tools as search engines, neural networks, fuzzy 
logic-based decision makers, and a variety of other systems well known to those of 
skill in the art of information processing. 

For unstructured text, e.g. news and the like, evaluators based on information 
retrieval or search based mechanisms work best. For structured database data, 
evaluators based on data access mechanisms work best. Evaluation mechanisms such 
as searching, prediction, collaborative filtering, data mining, each have semantic 
differences. 



wo 99/27432 

Structured information is sennantically organized and stored in databases, such 
as relational databases and object oriented databases. Structured information is 
accessed by semantic retrieval mechanisms with explicit syntax and semantics like 
SQL, OSQL, and ODBC. The entire database industry including decision support, data 

5 access, warehousing, mining, and middleware is built on the ability to store and extract 
information stored in structured data sources. 

Unstructured information, on the other hand, is mostly composed of free form, 
natural language text, e.g. news articles, documents, messages and web pages. The 
mechanisms to analyze such information are keyword or concept based search and 

20 retrieval. Search engines return either too much irrelevant information or too little 
relevant information. 

Qualitative information is more intuitive in nature. Human responses and 
experiences such as intuition, market conditions, investment style, personality, 
preferences or ratings are examples of qualitative information. Techniques for 

15 evaluating such information are based on collaborative filtering, qualitative data 
analysis, at the like. 

Quantitative information is evaluated based on precise analytical and 
mathematical models and expert knowledge. Examples include econometric models 
in the finance industry for measuring risk and performance as well as predictive 
20 models. Examples of the latter include systems for predicting frauds and purchase 
patterns. In many cases this type of information is proprietary and is used within 
research departments of organizations. 

Unfortunately, conventional systems typically allow querying over only a single 
type of data. For example, SQL, a standard database query language, provides 
25 database access, but not access to unstructured information, qualitative information, or 
the like. Thus, multiple querying systems are required, increasing complexity and cost. 

Likewise, current systems perform only a single type of analysis in a single 
query. For example, typical analysis mechanisms such as text searches, data access, 
collaborative filtering, prediction, and the like are performed separately. Thus, 

2 
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multiple queries are required to achieve multiple forms of analysis, decreasing 
performance and increasing complexity and cost. 

Another disadvantage is that current systems are static. In other words, such 
systems generally cannot be dynamically extended to incorporate new types of 
5 analysis via new evaluators without a substantial and expensive overhaul. 

What is needed, then, is a system and method for integrating heterogeneous 
information in which querying may be performed over multiple types of data in a 
single query. What is also needed is system and method for integrating 
heterogeneous information that is dynamically extendible to incorporate new types of 
10 anaylsis. 

SUMMARY OF THE INVENTION 

The present invention overcomes the aforementioned problems by providing a 
25 Universal Analysis Language (UAL) and the Universal Analysis Model, which is used 
by the method and system described herein to query over multiple different types of 
information using multiple different mechanisms for analyzing and evaluating 
information. 

The information to be queried can be located in multiple sources, for example 
20 databases, news wires, text files, online (Web) pages, can be of multiple types, e.g. 
unstructured text, structured database data, quantitative (numeric) data or qualitative 
data. The mechanisms to analyze and evaluate such information (hereafter referred to 
as "evaluators") can be equally diverse, e.g. text search methods to evaluate text data, 
data analysis methods for structured data, collaborative filtering for qualitative data, 
25 prediction for quantitative data, etc. The present invention allows users to query 
multiple different types of information, using multiple different evaluation 
mechanisms in a single framework for decision making. 

In accordance with one aspect the present invention, a computer-implemented 
method for querying multiple different types of information, each type of information 
30 having a different e valuator, includes receiving a query comprising an identification of 

3 
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at least two evaluators, at least one relationship between the evaluators, and a method 
of combining results from the evaluators; parsing the query to create an evaluation 
sequence comprising an ordered sequence of invocations of the evaluators; invoking 
the evaluators in the evaluation sequence; and combirung results from the evaluators 
5 according to the method of combining results from the evaluators specified in the 
quefry. 

In another aspect of the invention, a system for querying multiple different 
types of information, each type of information having a different evaluator, includes a 
parser for receiving a query comprising an identification of at least two evaluators, at 

W least one relationship between the evaluators, and a method of combining results from 
the evaluators; means, coupled to the parser, for creating an evaluation sequence 
comprising an ordered sequence of invocations of the evaluators; means, coupled to 
the creating means, for invoking the evaluators in the evaluation sequence; and means, 
coupled to the invoking means, for combining results from the evaluators according to 

15 the method of combining results from the evaluators specified in the query. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a flow chart of a method for integrating heterogeneous information 
using a universal analysis language (UAL) in accordance with an embodiment of the 
20 present invention. 

Figure 2 is an illustration of a plurality of truth functions in accordance with an 
embodiment of the present invention. 

Figure 3 is a screen view of a query in accordance with an embodiment of the 
present invention. 

25 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Universal Analysis Model 

The Universal Analysis Model (UAM) is a means to integrate the above- 
described diverse evaluators within a common framework. The UAM achieves this 
30 integration by combining the results of analyzing various bodies of information within 

4 
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a single framework. In addition, the UAM must erisure that this integration is 
semantic, in that the results of various evaluators are combined correctly. By 
correctness, it is meant the logical correctness (or validity) of a query over disparate, 
heterogeneous, information using different evaluators 
5 A key aspect of the present invention is the Universal Analysis Language, 

including a model, i.e. a formal interpretation structure, for that language. The 
Universal Analysis Language universalizes the analysis of information from diverse 
evaluators. 

The declarative syntax of the language has two key characteristics. First, it 

W enables expressing the relationships between various evaluators as well as combining 
the results obtained from these evaluators. The language employs multi-valued logic, 
a generalization of fuzzy logic, and an interpreter. It is based on a generalization of 
dynamically interpreted infinite- valued logic. With this language one can express the 
relationship between various evaluators and their relationship to the validity of 

25 decision criteria given the underlying information. In addition, this language enables 
one to handle incorisistency and incompleteness in the underlying information, as well 
as find incomplete or inconsistent information. 

Second, the language is dynamically extensible. It includes a mechanism to add 
as well as execute new evaluation mechanisms (either native evaluators or third-party 

20 evaluators) on the fly. Dynamic extendibility is supported the architecture in a 
variety of ways. In one embodiment, a DLL-based execution mechanism called a 
"Module Manager" can be dynamically extended (i.e. a new evaluator can be added 
without recompiling or even restarting the system). This approach is primarily used 
in, for example, a Microsoft Windows^M environment. In another embodiment, a 

25 Platform Independent CORBA based execution mechanism that uses an Evaluator 
Invocation Repository (based on CORBA's DII - the Dynamic Invocation Interface) is 
used. 

The key idea of the language is to express the relationship between the results of 
various evaluators, and their relationship to the truth of decision criteria in a user 
30 query. In one embodiment, two primary methods for combining criteria include: (i) 

5 
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Statistical, based on weighted average, and (ii) Intuitive Logic, based on an extension of 
multi-valued logic (Similar to a generalized version of fuzzy logic) with preferences for 
individual criteria. 

The above-described system provides several distinct benefits and advantages. 
First, is the ability to query over multiple types of data. For example, the underlying 
data can be text, structured, qualitative, quantitative. Second, is the fact that multiple 
types of analyses can be combined in a single query. For instance, the underlying 
analysis mechanisms can be text search, data access, collaborative filtering, prediction, 
or other analyses, Third, is dynamically extensibility. New forms of analysis based on 
new evaluators can be dynamically added to the system without significant cost or 
disruption. Fourth, is the fact that importance values can be assigned to individual 
decision criteria. Each criteria in a decision can have an associated importance value. 
Fifth, is the ability to use partially specified decision criteria. Information in a Criteria 
can be specified imprecisely. Terms such as "High", "Low", and "Aggressive" may be 
used instead of precise, domain-specific quantities. 

Svstem Implementation 

The system and methods disclosed herein may be implemented using a general 
purposes computer, such as. an IBM PC or compatible machine. For example, 
embodiments of the present system may be implemented using the C+4- and Java 
programming languages and executed on such a computer. 

In one embodiment, the implementation infrastructure consists of an object 
oriented programming language, a lexical analyzer, and a parser. Typical language 
interpretation systems take one or more files as input. The contents of the input file are 
logically concatenated and made available to the lexical analyzer. The lexical analyzer 
analysis the input stream of characters and based on user defined regular expressions, 
outputs a sequence of tokens. The sequence of tokens is then made available to the 
parser. The parser uses the grammar defined by the programmer to analyze the input 
stream of tokens and build a parse tree called an abstract syntax tree. The abstract syntax 
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tree is then traversed to perform various operations depending on the needs of the 
language designer. 

Object oriented programming (OOP) technology allows encapsulation of both 
data and algorithms into Object classes. OOP allows programmers to associate methods 

5 (functions or procedures) and variables (which may themselves be instances of objects) 
with object classes. Object classes can inherit both behavior (i.e., methods) and data 
from parent super-classes. Classes that inherit properties from parent classes are called 
sub-classes. In typical OOP languages, a sub-class can be used in any context where its 
parent class can be used. Object instances are created just as basic primitive types like 

10 characters or integers are created in conventional languages. Object instances can be 
named and used just like any other variables. OOP languages provide special syntax 
to refer to object instance variable and methods. 

In one embodiment of the invention, there is defined a set of Object Classes and 
Special Functions for defining, using, and combining Criteria in useful ways. For 

15 example, one embodiment includes the following components: 

1. Parser: The parser takes text definitions for one or more Criteria as input. 
The input is parsed to create a set of Criteria Definition Objects - one for 
each input text definition. 

2. Object Manager: The object manager serves as a repository for criteria 
20 definition objects, values (such as literals and definitions), and the 

expressions used to evaluate values for individual Criteria. 

3. Module Manager: The module manager allows retrieval of external 
function objects given only the name of the function as input. This 
component also allows for functions and object classes to be dynamically 

25 added to the system. 

4. Virtual Table: The Virtual Table provides all the available candidate data. 
This component unifies access to candidate information that may come 
from various data sources and data types. 



7 
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5. Special Functions/Classes: These classes were developed for the end-user 
who will define Criteria. These functions and classes help to make the 
process of developing Criteria much more efficient and easy. 

Parser 

The parser accepts a stream of Criteria Definitions in text format and builds a set 
of Criteria Definition Objects. One aspect of the present invention includes a special 
language called Criteria-Script (UAL) that is tailored to make it extremely easy and 
efficient to define and use Criteria. 

Table 1 provides an overview of the principle classes that are used to build the 
UAL parser. 



Table 1: Implementation Classes for the UAL Parser 


Num 


Class Name 


Description 


1 


Parser 


Created for encapsulating the parser. This class provides 
a function called "parser" that takes one of more Criteria 
Definitions and returns a list of Criteria Definition 
Objects. The BNF (grammar) for UAL (i.e., the language 
the parser is built to recognize) is described below. 


2 


CriteriaDefinition 


Created to encapsulate a single Criteria Definition. This 
class has slots (i.e., attributes) to hold data structures for 
various information items like: (a) the name of the 
Criteria, (b) its return type (primarily fuzz-truth value or 
real), (c) parameter list for Criteria, and (d) a 
CriteriaValue object for each defined value. 



8 
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Table 1: Implementation Classes for the UAL Parser 


Num 


Class Name 


Description 


3 


CriteriaValue 


Created to encapsulate the definition for a single value in 
a Criteria Definition. Stores and provides access to the 
following information items: (a) name of value, (b) list of 
parameters to the value, and (c) the expression that 
defines how to evaluate (i.e., compute) the result for the 
value. 



To facilitate the definition and use of Criteria, a domain specific language called UAL 
was developed. UAL has the following important features: 

1 . UAL allows Criteria to be easily defined. Criteria can return fuzzy-truth 
values or any real (float) valued result. 

2. UAL provides conversation operators for mapping between real values 
and fuzzy-truth values. 

3. UAL provides special support for dealing with fuzzy truth-values, fuzzy- 
expressions (expressions using logical /arithmetic operators that return a 
fuzzy truth-value. 

4. UAL provides special support to access and manipulate information from 
the Virtual Table mechanisms (see below). 

5. UAL provides an efficient mechanism to add functions at run time that 
may then be immediately used to define new Criteria. 

6. UAL is easy to use and has simple intuitive semantics that are tailored to 
defining and combining Criteria results in a safe and efficient manner. 

Object Manager 

The object manager (OM) was developed to provide convenient storage and 
access to data structures (i.e. objects, literal constants, names, etc.) that are necessary to 
support the implementation mechanisms for defining and evaluating Criteria. The 
object manager's functionality is available through a application programming 
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interface (API) (i.e., behavior that can be invoked from external fiinctions, objects, 
procedures, etc.). Table 2 outlines the key methods supported by the Object manager. 



Table 2: Key Methods Supported by the Object Manager 


# 


Method 


Description 


1 




This method takes a^ list of CriteriaDefinition 
objects and stores every element of the list for 
subsequent access and processing. This method 
has been designed in a flexible manner. It can 
either accept the list of CriteriaDefinitions from the 
"parser" function or some other external 
mechanism (file, data base, other objects, etc.) that 
can server as a source for the list of Criteria 
definitions. 


2 


OBJMJRGetDefinition 


This method accepts a Criteria name as input and 
returns the corresponding CriteriaDefinition object. 


3 


OBJMJRGetCriteriaDef 


This method returns the definition expression - 
that is, the method returns the expression object (as 
a VT_Object - see below). The expression object is 
the object that encapsulates the expression that the 
end-user used to define the Criterion as a whole or 
any particular value definition. The method takes 
as input either, just the name of the Criterion, or 
the name of the Criterion and the name of the 
value. The method returns the corresponding 
expression object. 


4 


OBJMGRGetValueDefinition 


This method takes as input the name of the 
Criterion and the name of a value definition in that 
Criterion. This method returns the corresponding 
CriteriaValue object. | 



10 
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Module Manager 

The Module Manager allows functions to be dynamically added and retrieved 
5 to support the execution (i.e., interpretation) of the expressions contained in Criteria. 
The following classes and functions are used to support the capabilities of the module 
manager. Table 3 outlines the key sub-components needed to implement the module 
manager. 



Table 3: Key Methods Supported by the Module Manager 


# 


Class 


Description 


1 


VT_Functor 


This abstract class is used to define a common 
interface for encapsulating functions. All 
functions that need to be used in Criteria 
definitions are "wrapped" in the VT Functor 
class. In addition, the code that implements 
each function must be contained in a 
dynamically loadable code segment [i.e. 
Windows DLL, Java class, etc.]; the code 
segment must define one method named 
"newFunction" that takes as arguments the 
name of the function and a set of arguments 
and returns an instance of the function, which 
as mentioned above inherits front VT_Functor. 
The handle can then be used to call a function 
by name and use it in computing values for 
expressions. VT_Functor is a sub-class of 
VT3ase. See the description of VT_Table for 
more information on VT_Base. 




Function 


Description 



11 
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Table 3: Key Methods Supported by the Module Manager 


# 


Class 


Description 


1 


AddObjectlmplementation 


Takes the name for the object and executable in 
a dynamic load library format. This function 
returns no value but it adds the name and the 
library unit in the internal data structure 
maintained by the module manager component. 


2 


RemObjectlmplementation 


Removes both the object class name and the 
library executable from the internal data 
structure of the module manager component. 


3 


AddFunctionlmplementation 


This function takes a function name and the 
name of a code segment that can be 
dynamically loaded using whatever facilities 
the OS/Platform provides for this purpose 
(such as DLLs or Java Classes). This function 
stores the function name and its executable into 
the internal data structure maintained by the 
module manager system. 


4 


RemoveFunctionlmpiementation 


Removes both the function name and the 
library executable from the internal data 
structure of the module manager component 


5 


NewFunction 


Takes as input a valid function name that has 
been added to the module manager using the 
AddFunctionlmplementation function. This 
function also accepts as optional list of 
arguments. This fimction creates an instance of 
VT_Functor and returns a handle to the 
instance. The instance can be used to call and 
execute the function with its needed input 
arguments. 



12 
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Virtual Table 

5 The VT is designed to hold all candidate-related information. Because candidate 

information can come from multiple sources and be of multiple types the VT was 
designed as a table whose cells are object classes. The VT table can be accessed given a 
candidate identifier that must be unique - each row of the VT corresponds to one and 
only one candidate. The implementation of the VT is supported by the classes listed 

10 and described in Table 4. 



Table 4: Classes Related to Implementation of the Virtual Table 


# 


Class Name 


Description 


1 


CandidatePool 


This abstract class creates an abstraction that 
represents the entire set of candidates. This 
class allows the retrieval of a CandidateSet (see 
below). The CandidateSet that is needed can be 
retrieved by specifying members with a specific 
fields, whether or not null values are needed/ 
etc. The fields /columns of the table desired, 
and whether it's acceptable to skip rows that 
have Null values in any or all of the needed 
fields/columns needs to be specified. This 
interface /abstract class is implemented by the 
Candida teTest class. 


2 


CandidateSet 


This class delivers a sub-set of the entire pool of 
available candidates. CandidateSet provides 
capabilities to systematically iterate and access 
each candidate in a sequential manner through 
iteration. 



13 
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Table 4: Classes Related to Implementation of the Virtual Table 


# 


Class Name 


Description 


3 


VT_Base 


The base, class (i.e., the root) for all classes 
related to supporting the Virtual Table 
mechanism. VTJase is created to enforce 
consistent functionality of all sub-classes. In 
particular, there have been added methods like 
IsConstant, IsReference, etc., that can be used to 
find out important attributes of the objects. 


4 


VT Float32 


This [abstract] class/ interface irJierits from 
VT_Base and is created to define a standard 
interface to floating point numbers. Similarly 
there have been defined other classes to take 
care of common data types like integers 
(VT_INT32) and Strings (VT^String). 


5 


VT_Fuzzy 


A [abstract] class/ interface that inherits from 
VT_Base and defines a consistent interface to 
fuzz-truth values. Methods are provided to 
access and set fuzzy truth-values as well as to 
validate the notion a fuzzy-truth value - a real 
valued number between zero and one. 
VT_Fuzzy_Imp a standard implementation of 
VT_Fuzzy. The VT_Fuzzy interface has been 
used to define [the argument types accepted by] 
the fuzzy logical operators: And, Or, Not, and 
Equality. 



14 
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Table 4: Classes Related to Implementation of the Virtual Table 


# 


Class Name 


Description 


6 


VT^et 


An abstract interface to a set (i.e. collection) of 
VT objects. VT^Set inherits from both VT3ase 
and FilterOperand [see table 6], which allows 
any object implementing the VT_Set interface to 
act as the left hand side operand of the filter 
operator [see table 6]. VT_Set has methods 
defined to allow you to obtain an iterator that 
can iterate over the members of the set. 


7 


VTJterator 


An abstract interface to an iterator that allows 
iteration through a set of VT objects 


8 


VT_Composite 


The class VT_Composite is an abstract interface 
to a VT object that contains a set of named 
fields, each of which has a VT object value and 
can be individually retrieved by specifying the 
field's name. This class inherits from both 
VT_Base and SelectionOperand, which allows 
any object that implements VT_Composite to 
act as left hand side of the selection operator 
[see table 6]. 


7 


SelectionOperand 


See table 6. 


8 


FilterOperand 


See table 6. 



The VT table provides a consistent, well-defined, interface for the language execution 
mechanism to access data and information in flexible ways. 



5 Special Classes/Functions 
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In one embodiment of the present invention, two types of special Object Classes 
and Functions are provided: 

1 . A set of functions to make it easier to write and use Criteria definitions. 

2. A set of functions and classes to make the implementation of the system 
easier and more efficient. 

Table 5 lists the functions designed and developed for defining and using Criteria: 



Table 5: Functions that make Defining Criteria Easier 


# 


Function 


Description 


1 


PI 


This function allows users to convert a floating-point 
number into a fuzzy-truth value. The function takes 
a definition for a fuzzy curve whose y-axis 
corresponds to the truth- value between 0 and 1. For 
a particular point on the x-axis, the y-axis gives the 
corresponding truth-value. PI takes an object, 
generally a variable such as candidate.length, as 
input and the fuzzy-curve. This curve could be a 
trapezoid, defined by 4 points, or any arbitrary 
curve defined by n points, or via an equation that 
can be evaluated using the module manager. 
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Table 5: Functions that make Defining Criteria Easier 


# 


Function 


Description 


2 


And/Or/Not/Wimp 


And, Or, and Not are standard logical operations 
that can be applied to fuzzy truth-values as well as 
to Boolean (two-valued) truth-values. Wimp, an 
acronym for With Importance, is a special operator 
that can be used by end-users to weight specific 
Criteria more or less heavily than other Criteria. 
Wimp is a binary operator. Thus, given an 
expression like (wimp(al,wl),wimp(a2,w2)) the 
result would be (al*wl + a2*w2)/wl+w2). One 
interpretation of A WIMP B is logically equivalent to 
A OR NOT B. 


3 


>,<,>=, <=,=,!= 


Standard comparison operators. These operators 
produce a strict /boolean truth value, although the 
results is of type VT_Fuzzy. 


4 


StringSimilarity 


This function takes two input text strings and 
returns the extent to which the contents of the two 
strings are similar. The results for these functions 
are in the range [0,1]. 


5 


List selection operator to 
selecting a list of 
candidates 


See the filter operand/ Apply Filter stuff in table 6. 


6 


Average 


Average takes an input set and computes the 
average of the list. By using the module manager it 
is possible to dynamically load all other important 
statistical operators such as median, mode, max, 
min, and standard deviation, among others. 
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Table 6 lists additional classes and functions that helped to make the implementation 
easier and more efficient. 



Table 6: Classes/Functions to make Implementation Easier and Efficient 


# 


Class 


Description 


1 


VT_CriteriaUse 


An object class that facilitates the use of existing 
Criteria definitions to define new Criteria 
definitions. This class stores the name of the 
Criteria, the parameters it needs etc. so that it can be 
evaluated by using the module manager to fetch all 
needed fimctions to use the results of Criteria. 
When an object needs to be evaluated, the name and 
value name associated with that criteria are passed 
to the Object manager to obtain the expression that 
is used to evaluate that particular value. 


2 


VT_FilterParam 


This object encapsulates use of the universal 
Quantifier. For example in the expression 
("all --x in working set where (-x.length > 10) ") 
-X is said to be the filter parameter. It ranges over 
all the candidates in the working set (see Appendix 1 
for details of thee working set). ~x.length is the use 
of a filter parameter - specifically -x.length is bound 
to the length of the "current" candidate of the 
working set. The universally quantified variable -x 
is represented by a VT_FilterParam class. 
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Table 6: Classes/Functions to make Implementation Easier and Efficient 


# 


Class 


Description 


3 


VT_RowPIaceholder 


UAL expressions can use the special keyword 
"row", (e.g., working set.[row.Tumover > 100]) 
refers to the candidates whose Turnover is greater 
than 100. This fimction is identical to the function 
VT_FilterParam, except that its value is always 
defined by the innermost "for all" that encloses it — 
i.e. ws[row.length > 10] is identical to (all --row in ws 
where (-row.length > 10). 


4 


VT^WorkingSet 


This class holds the current working set. It 
implements the VT_Set interface. Each member of 
the set is of type VT_Composite, where the named 
fields of the VT_Composite object correspond to the 
named fields of the candidate. The current working 
set is defined to be some subset of the universe of 
candidates. The working set in lODF is defined as 
the set of candidates over which a criteria is 
evaluated. By contrast, the universal set, is the set of 
all the candidates available vi the virtual table. 


5 


SelectionOperand 


This is an abstract class /interface that must be 
implemented by any VT_Object that can act as the 
left hand side operand of the selection operator (the 
In "candidate.length"). This interface provides a 
means to get the result of applying the selection. 



19 



wo 99/27432 



PCT/US98/24711 



Table 6: Classes/Functions to make Implementation Easier and Efficient 


# 


Class 


Description 


6 


FilterODGrand 


This is an abstract class /interface that must be 
implemented by any VT_Object that can act as the 
left hand side of the selection operator, "[ For 
example, given the expression X[Y], X must 
implement this interface. This interface provides a 
means to get the result of applying the filter to X. 




Function 


Description 


7 


ApplySelection(X, Y) 


A helper function used to simplify parsing of UAL. 
UAL expressions of the form X.Y are parsed to 
ApplySelection(X, Y). This function attempts to 
convert argument X to an instance of 
SelectionOperand and then invoke the method 

rIofinoH in ^<3l/3/^HnnOni=>r?i"nH that vip1H<; thp rpsiilt 

applying selection Y on object X. Using the 
SelectionOperand interface in UAL, one can make 
selections like X.Y (i.e., select component Y) of X. In 
order for this to be meaningful, object X must 
implement the SelectionOperand interface. 


7 


ApplyFilter 


This function is used for filter expressions of the 
formX[Y] and (all -x in z where (y)) This is a helper 
function used to simplify parsing of UAL. UAL 
expressions of the form X[Y] are parsed to 
ApplyFilter(X,Y). This function attempts to convert 
argument X to an instance of FilterOperand and 
then invoke the method defined in FilterOperand 
that yields the result applying filter Y to object X. 
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5 Method of Operation 

Referring now to Figure 1, there is shown a flow chart of a method for 
integrating heterogeneous information using a universal analysis language (UAL). 
The method begins by creating 102 a requirement or query. A requirement is an 
10 expression in UAL. The requirement may be created using an editor or other means. 
Thereafter, the method continues by parsing 104 the requirement by means of the 
parser. 

After the requirement is parsed, the method continues by checking 106 or 
validating the results of the UAL parsing of the requirement. If the results are not 

15 correct, i.e. the requirement does not conform to the UAL grammar, the method is 
complete. Otherwise, the method continues by creating 108 an ordered sequence of 
evaluations, using the available evaluators, necessary to execute the requirement. 

After the ordered sequence is created, the method continues by invoking 110 the 
evaluators to execute the evaluation sequence as determined in step 108. Thereafter, 

20 the results obtained of the evaluators are combined 112 using the appropriate UAL 
logic to produce the results of the requirement. The result is an ordering of decision 
candidates for the given requirement. 

Operational Examples 

25 

Example Query 

The following example illustrates how a sample query is executed. The query is 
represented in the system as a logical combination of criteria. The intent of the query is 
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for a user to find mutual funds based on their criteria. In this case the criteria (with 
their values) are: 



I. 


One Year Return, 


High 


2. 


Risk Adjusted Return, 


Very High 


3. 


Analysis, 


Long Term Capital Appreciation 


4. 


Rating by people like me 


High 


5. 


Similar To, 


KAUFX 



The first step is to determine an optimal evaluation sequence for these criteria 
based on the meta information available. • The first criteria, for instance, uses the 
structured data access evaluator. The data relevant to One Year Return for each 
candidate is retrieved from the appropriate source and the evaluator compares this 
information to the definition of High. The structured data access evaluator accesses 
and evaluates structured database information. It takes a multivalued concept and 
returns the extent of the match between the concept and the actual information. 
"High" may be defined as "> 15%", in which case this comparison is trivial. 
Alternatively, it may be defined using truth functions such as those shown in Figure 2. 

Alternatively, High may mean a better One Year Return than the average One 
Year Return of all the Funds known. Alternatively, it may mean better performance 
than, say, the S&P 500 index. The internal definition of the concept can be arbitrarily 
complex. This evaluator analyzes the extent to which the given, actual information 
matches the desired value. 

For the criteria "long term capital appreciation", the specified evaluator may be 
a string similarity evaluator. This evaluator takes the argument string, and the source 
document for the candidate, which may be an analyst report, and returns an 
evaluation. This evaluation is converted into a truth value for this criterion, by looking 
up the rules for using this evaluator in the Universal Analysis Model. The criterion 
"Rating by people like me" employs a collaborative filter, such as Firefly or 
NetPerceptions' GroupLens. Similarly, depending on the execution sequence for the 
remaining criteria, each evaluation mechanism is invoked with the appropriate 
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parameters. Following this, the results from the evaluators are combined in 
accordance with their relationship as specified in the UAM, and the combined results, 
i.e. the ranked set of funds matching the query, are returned to the user. These results 
are documentable, so that a user is able to drill down into the rationale for the decision 

5 recommendation, all the way down to the actual information from the source. The 
sequence for executing a query is: (i) Determine an optimal execution sequence, (ii) 
Run the evaluators on the appropriate criteria, (iii) Use the various evaluators to 
convert the evaluators' results into the truth of the criteria, and (iv) Combine the 
results into a single value for each candidate, in this case mutual funds, given the entire 

10 query. 

To keep the explanation simple, two assumptions are made in this query which 
the UAL does not require: 

1, None of the criteria has any preference or importance, associated to it. 

2. Each criteria is combined with an implicit AND connective. 
25 Figure 3 illustrates a screen view of a query. 

Criterion Definition Example 1 

The example below defines a Criterion for One Year Return. The Criterion 
defines 4 values. All the values are evaluated with respect to a user's input. Thus a 

20 "good" one year return is between 0.75 to 1.25 times the input desired return with a 

drop-off of 10% of the desired value at the two extremes. Thus if the user input "10" as 
the input desired value, a good one year return is between 7.5% to 12.5%. Anything 
less than 6.5% is definitely not good and one year returns between 6.5% and 7.5% are 
good to some degree - "more good" as they approach 7.5%. In a similar manner, the 

25 definition includes other Criterion values such as "VeryGood", "Great", and 
"Extraordinary". 

OneYearRecurn (0desiredReturn : Float) : Fuzzy 
{ 

value Gcx3d ( ) ( Pi (candidate. 'One Year Return*. BdesiredReturn • 0.75, 

QdesiredReturn * 1.25. 
©desiredReturn * 0.1, 
@desiredReturn "0.1) ) 
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value VeryGood ; ) { Pi (candidace . 'One Year Return*. QdesiredRecurn • 1.35, 

SdesiredReturn • 1.75, 

@desiredReturn * 0.1, 

SdesiredRecurn * 0.1) ) 

value Great ( ) ( Pi (candidate. One Year Return', SdesiredReturn - 1.85. 

0desiredReturn * 2.5, 

QdesiredReturn • 0.1, 

OdesiredReturn ♦ 0.1) ) 

value Extraordinary t ) { candidate. 'One Year Return' > @desiredReturn ♦2.5 ) 



Criterion Definition Example 2 

In the following example, there is defined a Criterion that is selective in an 
inverse manner. It defines a Mediocre fund to be one whose "OneYearRetum" (the 
definition of Example 1 above) is "Good" or "Very Good" but not "Great" or 
"Extraordinary". 

Mediocre Fund ( ) : Fuzzy 

^ (OneYearReturn for {Average (working set. 'One Year Return')) is Good 

OR OneYearReturn for (Average (working set. 'One Year Return')) is VeryGood) 
AND NOT (OneYearReturn for (Average (working set. 'One Year Return')) is Great 

OR OneYearReturn for (Average (working set. 'One Year Return)) is Extraordinary) 
' ) ' 

Criterion Definition Example 3 

The following function uses text analysis to evaluate a Criterion. 
StringSimilarity returns a float number describes the degree to which the two input 
strings are similar. The result of StringSimilarity is fed to PI to return a fuzz result. 

SeeksCapitalAppreciation { ) : Fuzzy 
{ 

Pi (StringSim( candidate. Notes, "seeks capital appreciaiton" ) , 0.1. 1.0, 0.05, 0.0) 

> 
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Universal Analysis Languajge (UAL) Grammar 

The following is a BNF grammar for the Universal Analysis Language (UAL), 
according to one embodiment of the present invention. 



5 // 

/ / MACRO - contains definitions of conunon types 

/ / © 1998 by IBrain Software, Inc. 

%macro 

10 {squotel ; 

{float} •[0-9]+\.[0-9]+(e[\-+][0-9]+)?'; 

(truthValue) •[10]\.[0-9]+[Tt]'; 

{integer) •[0-9]+'; 

{simpleld) •[A-Za-z_][A-Za-z_0-9]*'; 

15 {complexld) ■{squote}(A-Za-z_0-91[A-Za-z_0-9 ]*{squote)'; 

. // 

// EXPRESSION - contains REGEXP definitions 

// 

20 %expression Main 





•[ \t\n]+' 


%ignore; 




'{simpleld) 1 {complexld}' 


ID; 




'\"['^\n\"]*\"' 


STRING_LITERAL; 


25 


•\c 


L_PAREN, '('; 




•\y 


R_PAREN, •)•; 




'\{' 


L BRACE, '{*; 




'\Y 


R_BRACE, •)•; 




■\[- 


L_BRACKET, '[•; 


30 


•\r 


R_BRACKET, ']■; 






COLON, :*; 






COMMA, 




'M- 


VERT_LINE, • 1 •; 




•&• 


AMPERSAND, *&*; 


35 


•\*' 


STAR, 




V 


SLASH, •/•; 




•\+" 


PLUS, ■+■; 




■>' 


GT, •>•; 




■<' 


LT, •<'; 


40 


•>=' 


GE, ■>=■; 




•<=• 


LE, '<='; 




'!=' 


NE, •!=•; 
EQUAL, •=•; 



25 
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MINUS, 






PERCENT, •%'; 




'#• 


POUND, '#'; 






EXCL, '!'; 


5 




DOLLAR, $; 




'—' 
•@' 


TILDA, 
AT, 




■\.' 


PERIOD, .; 




{ float! 


FLOAT_LITERAL; 


10 


{integer} 


INTEGER_LITERAL; 




(truthValue) 


TRUTH_LITERAL; 




'[Aa][Nn][Ddl' 


AND; 




•[Oo][Rr]' 


OR; 




•[Nnl[Oo][Tt]' 


NOT; 


15 


•[Ff][Ool[Rr]' 


FOR; 




'[Ii][Ss]' 


IS; 




•[Aa][Ll][Ll]' 


ALL; 




•[Ii][Nn]' 


IN; 




'[Ww][Hh][Ee][Rr][Ee]' 


MATT M T"' 

WHERE; 


20 


'value* 


VALUE; 




candidate' 


CANDIDATE; 




'working set' 


WORKING_SET; 




'universal set' 


UNTVERSAL.SET; 




'row' 


ROW; 


25 


'with importance" 


WITH_IMPORTANCE; 



/////////////////////////////////////////////////////////////// 
// 

30 // PRECEDENCE 

// 

%prec 

// Logical Ops (Lowest priority) 
35 1, '&*, %left; 
1, • I ', %left; 



2, WITHJMPORTANCE, %left; 

40 

// Logical NOT (higher than other logical ops) 

3, NOT, %right; 
3, •!', %right; 

45 I / Boolean comparison ops 

26 



wo 99/27432 PCTAJS98/24711 



4, •!=", 7oleft; 
4, •<•, %left; 
4, •>•, %left; 
4, •<=•, %left; 
5 .4/>=',%left; 

4, '=•, %left; 

// Additive Ops 

5, '+", %left; 
10 5, %left; 

// Multiplicative Ops 

6, %left; 
6, •/', %left; 

15 

/////////////////////////////////////////////////////////////// 
// 

// PRODUCTION ~ grammar productions with start symbol = start 
20 // 

%production start 

Startl start -> criteria_definition; 

Start2 start -> start criteria_definition; 

25 

// 

// Criteria Definitior\ 

// 

CritDef criteria_definition -> identifier '(' typed_param_list ')* ':' 

30 type_identifier criteria_body; 

// 

// Parameter Declaration List 

// 

35 TypedParamList typed_param_list -> parameter_decl ',* typed_param_list; 
TypedParamListl typed_param_list -> parameter_decl; 
TypedParamListO tjrped_param_list -> ; 



40 



ParameterDec parameter_decl -> identifier ':' type_identifier; 



// 

// Criteria Body 
// 

45 CriteriaBodyl criteria_body -> '{* value_definition_list ')'; 
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30 



// 

// Value Definition List 
// 

ValueDefList value_definition_list -> value_def value_definition_list; 
ValueDefListl value_definition_list -> value^def; 

ValueDef value^def -> VALUE identifier '(' typed_paramjist 'Y expression 



// 

// Expression 
25 // 

Expression! 

ExpressionAnd 

ExpressionOr 

20 // 

II Term 
// 

TermPlus 

TermMinus 
25 TermTimes 

TermDivideBy 

TermWIMP 

TermGT 

TermLT 
30 TermNE 

TermEQ 

TermLE 

TermGE 

TermNot 
35 Term 



expression -> term; 

expression -> expression and term; 

expression -> expression or term; 



term -> term term; 
term -> term term; 
term -> term term; 
term -> term V* term; 

term -> term WITHJMPORTANCE term; 

term -> term > term; 

term -> term '<' term; 

term -> term '!=' term; 

term -> term term; 

term -> term '<=' term; 

term -> term *>=' term; 

term -> not term; 

term -> literal; 



40 



// 

// Literal 

// 

//LiteralNot 

LiteralFormula 

LiteralExp 



literal -> not literal; 
literal -> formula; 
literal -> \ expression ')*; 



45 
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30 



15 



// 

// Formula 

// 

FormulaCrit 

FormulaFunc 

FormulaConst 

FormulaSelect 

FormulaParam 

// 

/ / Criteria Usage 
// 

CritUsel 
CritUseZ 
CritUse3 
CritUse4 
CritUseS 
CritUse6 



formula -> criteria_usage; 
formula -> function_usage;. 
formula -> constant_value; 
formula -> field_expr; 
formula -> '@* identifier; 



20 



criteria_usage -> identifier; 

criteria_usage -> identifier FOR argumentjist ')*; 
criteria_usage -> identifier IS identifier; 
criteria_usage -> identifier IS identifier *(' argumentjist ')'; 
criteria_usage -> identifier FOR argumentjist IS identifier; 
criteria^usage -> identifier FOR '(' argumentjist J 
IS identifier argumentjist J; 



/ 1 Field Expression - an expression that selects/retums/operates on fields 

25 // 

FieldExprSel field^expr -> field_base field^selection; 
FieldExprFunc field^expr -> field_base field_func; 
FieldExprlndex fieid_expr -> field_base field Jndex; 

30 FieldBaseCand field^ase -> CANDIDATE; 

FieldBaselD field_base -> ID; 

FieldBaseWS f ield^ase -> WORKING_SET; 

FieldBaseUS field.base -> UNIVERSAL.SET; 

FieldBaseFunc field^base -> function_usage; 
35 FieldBaseRow field_base -> ROW; 

FieldBaseParam field_base -> ID; 

FieldBaseExpr field.base -> field^expr; 

FieldBaseFilter field.base -> ALL ID IN WORKING.SET WHERE 
expression *)' ')*; 

40 

// 

/ / Field Terms 
// 

FieldTermSel field_selection -> identifier; 

45 FieldTermFuncl field_func -> *(' argumentjist ')'; 
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FieldTermlndexl field Jndex -> expression *]*; 
// 

/ / Function Usage 

// 

FuncUsage function_usage -> identifier argument Jist ')'; 



// 

20 // Argument List 
// 

ArgList argument_list -> argumentjist *; expression; 

ArgListl argumentjist -> expression; 

ArgListEmpty argument^list -> ; 

35 

// 

// Binary Operators 

// 

//BOPTimes binary_op -> '*'; 

20 //BOPDivide binary _op -> 7'; 

//BOPPlus binary^op -> '+'; 

//BOPMinus binary^op -> '-'; 

/ /BOPWithImp binary^op -> WITH IMPORTANCE; 

//BOPGT binary^op -> '>'; 

25 //BOPLT binary_op -> '<'; 

//BOPNE binary_op -> *!= ; 

// More binary ops to be defined 

30 // 

/ / Type Identifiers 

// 

Typeldentifier typejdentifier -> ID; // Need to add more flexible type ids 

35 

// 

/ / Constant Value 

// 

ConstFloat constant_value -> FLOAT^LITERAL; 

40 Constint constant_.value -> INTEGER_LITERAL; 

ConstString constant^value -> STRING^LITERAL; 

ConstTruth constant_value -> TRUTH^LITERAL; 



45 // 
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// Miscellaneous Productions 
// • 

Identifier identifier -> ID; 

Andl and -> 

And2 and -> AND; 

Orl or -> • I '; 

Or2 or -> OR; 

Notl not ->'!'; 

Not2 not -> NOT; 



The above description is included to illustrate the operation of the preferred 
embodiments and is not meant to limit the scope of the invention. The scope of the 
invention is to be limited only by the following claims. From the above discussion, 
many variations will be apparent to one skilled in the art that would yet be 
encompassed by the spirit and scope of the present invention. 

What is claimed is: 
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Claims 

1. A computer-implemented method for querying multiple different types 
of information, each type of information having a different evaluator, the method 

5 comprising: 

receiving a query comprising an identification of at least two evaluators, at least 
one relationship between the evaluators, and a method of combining 
results from the evaluators; 

parsing the query to create an evaluation sequence comprising an ordered 
10 sequence of invocations of the evaluators; 

invoking the evaluators in the evaluation sequence; and 

combining results from the evaluators according to the method of combining 
results from the evaluators specified in the query. 

2. The method of claim 1, wherein the parsing step includes the substep of: 
15 validating the query against a grammar; 

wherein the invoking and combining steps are performed only if the query is 
successfully validated. 

3. The method of claim 1, wherein the types of information are selected 
from the group consisting of structured information, unstructured information, 

20 qualitative information, and quantitative information. 

4. The method of claim 1, wherein the evaluators are configured to perform 
a function from the group consisting of searching, prediction, collaborative filtering, 
and data mining. 

5. A system for querying multiple different types of information, each type 
25 of information having a different evaluator, the system comprising: 

a parser for receiving a query comprising an identification of at least two 
evaluators, at least one relationship between the evaluators, and a 
method of combining results from the evaluators; 
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means, coupled to the parser, for creating an evaluation sequence comprising an 

ordered sequence of invocations of the evaluators; 
means, coupled to the creating means, for invoking the evaluators in the 

evaluation sequence; and 
5 means, coupled to the invoking means, for combining results from the 

evaluators according to the method of combining results from the 

evaluators specified in the query. 

6. The method of claim 5, the types of information are selected from the 
group consisting of structured information, unstructured information, qualitative 

10 information, and quantitative information. 

7. The method of claim 5, wherein the evaluators are configured to perform 
a function from the group consisting of searching, prediction, collaborative filtering, 
and data mining. 

8. A computer-readable medium having computer-readable program code 
15 modules embodied therein for querying multiple different types of information, each 

type of information having a different evaluator, the computer-readable medium 

comprising: 

computer-readable program code modules configured to receive a query 
comprising an identification of at least two evaluators, at least one 
20 relationship between the evaluators, and a method of combining results 

from the evaluators; 
computer-readable program code modules configured to parse the query to 
create an evaluation sequence comprising an ordered sequence of 
invocations of the evaluators; 
25 computer-readable program code modules configured to invoke the evaluators 

in the evaluation sequence; and 
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computer-readable program code modules configured to combine results from 
the evaluatbrs according to the method of combining results from the 
evaluators specified in the query. 
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